ABSTRACT

We present a general Bayesian formalism for the definition of Figures of Merit (FoMs)
quantifying the scientific return of a future experiment. We introduce two new FoMs 
for future experiments based on their model selection capabilities, called the decisive- 
ness of the experiment and the expected strength of evidence. We illustrate these by 
considering dark energy probes, and compare the relative merits of stage II, III and 
IV dark energy probes. We find that probes based on supernovae and on weak lensing 
perform rather better on model selection tasks than is indicated by their Fisher matrix 
FoM as defined by the Dark Energy Task Force. We argue that our ability to optimize 
future experiments for dark energy model selection goals is limited by our current un- 
certainty over the models and their parameters, which is ignored in the usual Fisher 
matrix forecasts. Our approach gives a more realistic assessment of the capabilities of 
future probes and can be applied in a variety of situations. 

Key words: Cosmology - Bayesian model comparison - Statistical methods 






1 INTRODUCTION 

As cosmology becomes increasingly dominated by results emerging from
large-scale observational programmes, it is imperative to be able to justify
that resources are being deployed as effectively as possible. In recent years
it has become standard to quantify the expected outcome of cosmological
surveys to enable comparison, a procedure exemplified by the Figure of Merit
(FoM) introduced by Huterer & Turner (2001) and later used by the Dark Energy
Task Force (DETF) for dark energy surveys (Albrecht et al. 2006, 2009). Still
in its infancy, however, is the topic of survey design, where an experiment is
optimized, within design or cost constraints, to generate the best scientific
outcome (Bassett 2005; Bassett, Parkinson & Nichol 2005; Parkinson et al.
2007, 2010).



Both in quantifying and in optimizing survey capability, it is important to
identify the scientific questions one hopes to answer. The DETF FoM measures
the expected parameter constraints on a two-parameter dark energy model, using
a Fisher matrix approach; this is an example of a parameter estimation FoM, in
which the correct cosmological model is assumed to be known and the task is to
estimate its parameter values (see also e.g. Mortonson, Huterer & Hu 2010).
However, many of the most pressing questions in cosmology concern not
parameters but models, i.e. the identification of the correct set of
parameters to describe our Universe. Examples are whether cosmic acceleration
is due to a cosmological constant, quintessence, or modified gravity, and
whether or not the Universe has zero spatial curvature. These are model
selection questions, hence forecasts of the capabilities of future probes
should be assessed by their power to answer such questions, rather than the
more limited question of the error they will be able to achieve assuming a
given model is true (i.e., the usual Fisher matrix forecast). Alternative
FoMs, which quantify the ability of experiments to answer model selection
problems, have been previously discussed by Mukherjee et al. (2006), Trotta
(2007b), and Trotta et al. (2010).^1

In this paper we present a comprehensive formalism for 
the construction of survey FoMs, incorporating both model 
and parameter uncertainty in light of the present observa- 
tional situation. In order to do so, we build on the method-
ology introduced in Trotta et al. (2010). We construct two
new model selection FoMs, the decisiveness and the expected 
strength of evidence, which quantify the expected capability 
of an experiment to perform model comparison tests. For il- 
lustration we focus on the case of dark energy observations, 
though our formalism is broadly applicable. 



^1 For an alternative, essentially frequentist, perspective on this
issue, see Amara & Kitching (2010).



© 0000 RAS 



2 Trotta, Kunz & Liddle 



2 BAYESIAN FRAMEWORK FOR 
PERFORMANCE FORECASTING 

2.1 The expected utility of an experiment 

In order to build up towards the definition of our FoMs, we 
need to consider the different levels of uncertainty that are 
relevant when predicting the probability of a certain model 
selection outcome from a future probe. Those can be sum- 
marized as follows: 

• Level 1: current uncertainty about the correct model 
(e.g., is it a cosmological constant or a dark energy model?). 

• Level 2: present-day uncertainty in the value of the cos- 
mological parameters for a given model (e.g., present error 
on the dark energy equation of state parameters assuming 
an evolving dark energy model). 

• Level 3: realization noise, which will be present in fu- 
ture data even when assuming a model and a fiducial choice 
for its parameters. 

The commonly-used Fisher matrix forecast (see,
e.g. Tegmark, Taylor & Heavens 1997) ignores the
uncertainty arising from Levels 1 and 2, as it assumes 
a fiducial model (Level 1) and fiducial parameter values 
(Level 2). It averages over realization noise (Level 3) in the 
limit of an infinite number of realizations. Furthermore, in 
the Fisher matrix formalism the likelihood is approximated 
by construction as a Gaussian, which might be inaccurate 
for parameter spaces exhibiting curving degeneracies 
and/or multimodal distributions. Clearly, the Fisher matrix 
procedure provides a very limited assessment of what we 
can expect for the scientific return of a future probe, as it 
ignores the uncertainty associated with the choice of model 
and parameter values. 

The Bayesian framework allows improvement on the 
usual Fisher matrix error forecast thanks to a general proce- 
dure which fully accounts for all three levels of uncertainty 
given above. This will allow us to define a new type of FoM 
which represents in a more realistic way the uncertainties 
involved in making predictions. 

Following Loredo (2003), we think of future data $D_f$ as outcomes, which
arise as a consequence of our choice of experimental parameters $e$ (actions).
For each action and each outcome, we define a utility function $U(D_f, e)$.
Formally, the utility only depends on the future data realization $D_f$.
However, as will become clear below, the data $D_f$ are realized from a
fiducial model and model parameter values. Therefore, the utility function
implicitly depends on the assumed model and parameters from which the data
$D_f$ are generated. The best action is the one that maximizes the expected
utility, i.e. the utility averaged over possible outcomes:

$$\mathcal{EU}(e) = \int dD_f \, p(D_f|e,d) \, U(D_f,e). \qquad (1)$$

Here, $p(D_f|e,d)$ is the predictive distribution for the future data,
conditional on the experimental setup ($e$) and on current data ($d$). For a
single fixed model the predictive distribution is given by

$$p(D_f|e,d) = \int d\theta \, p(D_f,\theta|e,d)
             = \int d\theta \, p(D_f|\theta,e,d) \, p(\theta|e,d)
             = \int d\theta \, p(D_f|\theta,e) \, p(\theta|d), \qquad (2)$$

where the last line follows because $p(D_f|\theta,e,d) = p(D_f|\theta,e)$
(conditioning on current data is irrelevant once the parameters are given) and
$p(\theta|e,d) = p(\theta|d)$ (conditioning on future experimental parameters
is irrelevant for the present-day posterior). So we can predict the
probability distribution for future data $D_f$ by averaging the likelihood
function for the future measurement (Level 3 uncertainty) over the current
posterior on the parameters (Level 2 uncertainty). The expected utility then
becomes

$$\mathcal{EU}(e) = \int d\theta \, p(\theta|d) \int dD_f \, p(D_f|\theta,e) \, U(D_f,e). \qquad (3)$$

So far, we have tacitly assumed that only one model was being considered for
the data. In practice, there will be several models that one is interested in
testing (Level 1 uncertainty), and typically there is uncertainty over which
one is best. This is in fact one of the main motivations for designing a new
dark energy probe. If $N$ models $\{M_1, \dots, M_N\}$ are being considered,
each one with parameter vector $\theta_i$ ($i = 1, \dots, N$), the current
posterior can be further extended in terms of model averaging (Level 1),
weighting each model by its current model posterior probability, $p(M_i|d)$,
given by

$$p(M_i|d) = \frac{p(d|M_i) \, p(M_i)}{p(d)}, \qquad (4)$$

where $p(d|M_i)$ is the Bayesian evidence for model $M_i$, $p(M_i)$ is the
model's prior and $p(d)$ a normalizing constant. Using Eq. (3), this gives the
model-averaged expected utility

$$\mathcal{EU}(e) = \sum_{i=1}^{N} p(M_i|d) \int d\theta_i \, p(\theta_i|d,M_i)
  \times \int dD_f \, p(D_f|\theta_i,e,M_i) \, U(D_f,e,M_i). \qquad (5)$$

This expected utility is the most general definition of a FoM for a future
experiment characterized by experimental parameters $e$. As we show below, the
usual Fisher matrix forecast is recovered as a special case of Eq. (5), as are
other FoMs that have been defined in the literature, e.g. Bassett (2005);
Wang (2008); Amara & Kitching (2010). Therefore Eq. (5) gives us a formalism
to define in all generality the scientific return of a future experiment. This
result clearly accounts for all three levels of uncertainty in making our
predictions: the utility function $U(D_f,e,M_i)$ (to be specified below)
depends on the future data realization, $D_f$ (Level 3), which in turn is a
function of the fiducial parameter values, $\theta_i$ (Level 2), and is
averaged over present-day model probabilities (Level 1).



2.2 Figures of Merit from expected utility 

The expected utility of Eq. (5) provides the most general formalism for the
evaluation of the scientific return of an experiment. It reduces to previously
used FoMs for specific choices of priors and utility functions. For example,
the DETF advocated using the inverse of the area of the future probe
covariance matrix on the dark energy parameters as a FoM quantifying the
strength of the statistical constraints from the experiment. This FoM can be
recovered by setting $N = 1$ in Eq. (5) (only one fiducial model is
considered), taking a Dirac delta function for the current posterior,
$p(\theta|d,M) = \delta(\theta - \theta_*)$ (only the fiducial parameter
vector $\theta_*$ is considered), assuming no realization noise (or
equivalently, averaging over many future data realizations), so that
$p(D_f|\theta,e,M) = \delta(D_f - D(\theta_*))$, where $D(\theta_*)$ describes
a no-noise data realization around the fiducial parameter values, and defining
the utility function as the determinant of the future Fisher matrix, evaluated
at the fiducial parameter values, $\theta_*$.

Another example is the Gaussian linear model considered by Trotta et al.
(2010), where the utility function was chosen to be the inverse of the
marginal error on the parameters of interest. It is a property of the Gaussian
linear model that the error ellipse does not depend on the fiducial model nor
on the data realization, but only on the design matrix (Kunz, Trotta &
Parkinson 2006). Therefore, in this case the integration over future data
$D_f$ gives unity in Eq. (3), and the same expression is recovered as in
Trotta et al. (2010).

Mukherjee et al. (2006) defined two model selection FoMs, each of which
considers two models, a cosmological constant model and a two-parameter dark
energy model. One FoM asks for the strength with which the dark energy model
will be excluded if the cosmological constant is correct; the current
posterior is therefore taken to be that model and the FoM is the Bayes factor
(defined below) in favour of the cosmological constant. The other FoM is the
opposite, quantifying whether the cosmological constant can be ruled out if
the dark energy model is correct. The current posterior is now the dark energy
model space, and the FoM measures in how much of that space the cosmological
constant model could be excluded (for example, the inverse parameter area
above a certain Bayes factor threshold, by analogy to the DETF FoM above).

Trotta (2007b) introduced a methodology to compute the predicted posterior
odds distribution (PPOD) for a model comparison from a future experiment. A
PPOD-based Figure of Merit is another special case of our general formalism:
it is obtained from Eq. (5) by assuming no realization noise,
$p(D_f|\theta,e,M) = \delta(D_f - D(\theta_*))$, and adopting as utility
function the tail probability of the Bayes factor obtainable by a future
probe.

For a given experimental configuration $e$, the expected utility can be
evaluated as follows:

(i) Draw a uniformly-weighted sample for the fiducial value for the
parameters, $\theta_*$, from a Monte Carlo Markov chain distributed according
to the present, model-averaged, posterior
$p(\theta|d) = \sum_i p(M_i|d) \, p(\theta_i|M_i,d)$ (Levels 1 and 2).

(ii) Generate pseudo-data $D_f$ for the future probe, assuming $\theta_*$ as
fiducial parameter values.

(iii) Evaluate the utility function from the future data (to be defined
below).

(iv) Loop back to (i) and average the utility function over the so-obtained
samples.
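The four steps above can be sketched as a short Monte Carlo loop. This is a minimal illustration under stated assumptions, not the authors' code: the one-parameter toy model, the function names (`draw_fiducial`, `simulate_data`, `expected_utility`), the toy posterior samples and the placeholder utility are all hypothetical. Each fiducial draw is paired with a single pseudo-data realization, which still averages over both levels when many draws are taken.

```python
import random

def draw_fiducial(model_probs, posterior_samples, rng):
    """Step (i): pick a model with its posterior probability (Level 1),
    then a parameter sample from that model's posterior chain (Level 2)."""
    models = list(model_probs)
    weights = [model_probs[m] for m in models]
    model = rng.choices(models, weights=weights, k=1)[0]
    theta_star = rng.choice(posterior_samples[model])
    return model, theta_star

def simulate_data(theta_star, sigma, rng):
    """Step (ii): one noisy pseudo-data realization (Level 3).
    Toy assumption: the datum is the parameter plus Gaussian noise."""
    return rng.gauss(theta_star, sigma)

def expected_utility(utility, model_probs, posterior_samples, sigma,
                     n_draws=10000, seed=0):
    """Steps (iii)-(iv): evaluate the utility and average it over draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        model, theta_star = draw_fiducial(model_probs, posterior_samples, rng)
        d_f = simulate_data(theta_star, sigma, rng)
        total += utility(d_f, model)
    return total / n_draws

# Toy usage: M0 fixes the parameter at 0, M1 lets it vary (made-up samples).
model_probs = {"M0": 0.93, "M1": 0.07}
posterior_samples = {"M0": [0.0], "M1": [-0.4, -0.1, 0.2, 0.5]}
# Placeholder utility: 1 if the datum lies more than 3 sigma away from zero.
eu = expected_utility(lambda d_f, model: 1.0 if abs(d_f) > 0.3 else 0.0,
                      model_probs, posterior_samples, sigma=0.1)
```

In a realistic application the placeholder utility would be replaced by one built from the Bayes factor of the pseudo-data, which is where most of the computational cost arises.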



In general, the above procedure is computationally very 
expensive, as it involves two nested averages, one over the 
fiducial parameters (step (i)) and one over future pseudo- 
data realizations (step (ii)). Furthermore, in the context of 
model selection oriented FoMs to be introduced below, the 
evaluation of the utility (step (iii)) requires the computation 
of Bayes factors from the pseudo-data, which again is costly. 
If one wanted to use Markov Chain Monte Carlo (MCMC) 
techniques, one would typically need ~ 10* samples in step 
(i), and another ~ 10''' samples to obtain a reliable estimate 
of the utility function in steps (ii) and (iii). Therefore, the 
typical number of likelihood evaluations required would be 
of order ~ 10^, which is at the limit of what can be achieved 
today unless one adopts highly accelerated inference methods
(Fendt & Wandelt 2007; Auld, Bridges & Hobson 2008; Frommert et al. 2010;
Bridges et al. 2010). Therefore, we
shall make some simplifying assumptions that reduce this 
computational burden very considerably. 

Firstly, we will consider only $N = 2$ competing models. Secondly, we will
work in the Gaussian likelihood approximation, i.e., we will assume that both
the present-day and the future likelihood are well approximated by Gaussian
distributions. This is the same kind of approximation involved in the usual
Fisher matrix forecast. The assumption of Gaussianity further allows us to
side-step the pseudo-data generation step: for a given value of the fiducial
parameters, $\theta_*$, the maximum likelihood estimate $\theta_f$ from future
data $D_f$ generated from $\theta_*$ is distributed as a Gaussian with mean
$\theta_*$ and covariance matrix given by the inverse of the likelihood Fisher
matrix for the future probe. As a consequence, we do not need to generate
pseudo-data at all in step (ii), and we can instead work directly in parameter
space, by drawing $\theta_f$ directly from a Gaussian distribution centered on
$\theta_*$.
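As a rough one-parameter illustration of this shortcut, the future maximum likelihood estimate can be drawn directly from a Gaussian whose variance is the inverse of the future Fisher information. The function name `draw_future_ml` and the numerical values are assumptions made for the example; the multi-parameter case would use a multivariate Gaussian with covariance equal to the inverse Fisher matrix.

```python
import math
import random

def draw_future_ml(theta_star, fisher_future, rng):
    """Draw the future maximum likelihood estimate for one parameter:
    Gaussian with mean theta_star and variance 1 / fisher_future."""
    sigma = 1.0 / math.sqrt(fisher_future)
    return rng.gauss(theta_star, sigma)

rng = random.Random(1)
# Hypothetical fiducial value and Fisher information (sigma = 0.5).
draws = [draw_future_ml(-1.0, 4.0, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
```

With enough draws the sample mean and variance approach the fiducial value and the inverse Fisher information respectively.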

Having made the above simplifications, we now turn to 
using the expected utility to define two new FoMs based on 
model selection. 



3 FIGURES OF MERIT FOR MODEL 
SELECTION 

To assess the science return of proposed missions in terms 
of their model selection capabilities, we propose to adopt 
the expected utility of Eq. (5) as a FoM for experiment $e$,
after defining an appropriate utility function $U(D_f,e,M_i)$.
There are many ways to do this, and we introduce here two 
proposals. The first one is named decisiveness, and it gives 
the probability that the proposed experiment will achieve a 
decisive outcome for model selection. A good experiment 
should be as decisive as possible. A complementary ap- 
proach, named expected strength of evidence, is to compute 
by how much the experiment is expected to prefer one or 
other model on average. Again, a good experiment will be 
able to prefer one of the models strongly. 

In a two-way Bayesian model comparison, the key Bayesian statistic is the
Bayes factor $B_{01}$, which is formed from the ratio of the Bayesian
evidences of the two models being considered:

$$B_{01} = \frac{p(d|M_0)}{p(d|M_1)}, \qquad (6)$$

where the Bayesian evidence is the average of the likelihood under the prior
in each model:

$$p(d|M_i) = \int d\theta_i \, p(d|\theta_i,M_i) \, p(\theta_i|M_i). \qquad (7)$$

Table 1. Empirical scale for evaluating the strength of evidence when
comparing two models, $M_0$ versus $M_1$ (Jeffreys' scale). The rightmost
column gives our convention for denoting the different levels of evidence
above these thresholds.

|ln B_01|    Odds         Strength of evidence
< 1.0        ≲ 3 : 1      Inconclusive
1.0          ~ 3 : 1      Weak evidence
2.5          ~ 12 : 1     Moderate evidence
5.0          ~ 150 : 1    Strong evidence



The Bayes factor updates the prior probability ratio of the models to the
posterior one, indicating the extent to which the data have modified one's
original view on the relative probabilities of the two models. The Bayes
factor can be evaluated by a general numerical method such as nested sampling
(Skilling 2004; Bassett, Corasaniti & Kunz 2004; Parkinson, Mukherjee & Liddle
2006; Feroz, Hobson & Bridges 2009), or, if one model is nested within the
other, by the Savage-Dickey density ratio (SDDR) (Trotta 2007a, 2008). The
Bayes factor is usually interpreted on the Jeffreys' scale shown in Table 1
(Jeffreys 1961; Gordon & Trotta 2007).
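The thresholds of the Jeffreys' scale can be encoded in a small helper that maps a log Bayes factor to its evidence label. This is only a sketch: the function name `jeffreys_label` is hypothetical, and the thresholds 1.0, 2.5 and 5.0 are those quoted in Table 1.

```python
import math

def jeffreys_label(ln_b01):
    """Return the evidence label for a given ln B_01, using the
    thresholds |ln B_01| = 1.0, 2.5 and 5.0 from Table 1."""
    x = abs(ln_b01)
    if x < 1.0:
        return "Inconclusive"
    if x < 2.5:
        return "Weak evidence"
    if x < 5.0:
        return "Moderate evidence"
    return "Strong evidence"

# Odds corresponding to the thresholds: exp(1.0), exp(2.5), exp(5.0),
# i.e. roughly 3:1, 12:1 and 150:1.
threshold_odds = [math.exp(t) for t in (1.0, 2.5, 5.0)]
```

The sign of $\ln B_{01}$ indicates which model is favoured (positive for $M_0$), while the label depends only on its absolute value.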



3.1 The 'decisiveness' Figure of Merit 

A 'decisive' experiment is one that is able to gather strong evidence in
favour of one of the competing models. Therefore, its utility function is 0
(1) if the Bayes factor it will obtain is below (above) the 'strong' threshold
for the evidence, $|\ln B| = 5$, see Table 1 (this level of evidence is
sometimes called 'decisive', hence the name of the FoM). We are thus led to
the following utility function

$$U(D_f,e,M_i) = \begin{cases} 1 & \text{if } |\ln B_{01}| > 5 \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$

where $B_{01}$ is the Bayes factor between the two models, obtained by the
future experiment $e$. The best experiment is the one that maximizes this
quantity, i.e. the one whose probability of obtaining a strong model selection
outcome for either of the models is maximized. We thus define the decisiveness
$\mathcal{D}$ of an experiment $e$ as its expected utility, Eq. (5), with the
utility function (8). We note that $\mathcal{D}$ as a Figure of Merit is
especially resilient to the scatter in the Bayes factor coming from averaging
over data realizations and the unknown fiducial parameter values (Jenkins &
Peacock 2011). In fact, our formalism takes this scatter into full account,
and if too many realizations are scattered out of the 'decisive' region (e.g.
due to large noise on the measurements from the future probe) then this will
lead to a lower Figure of Merit. Therefore, using $\mathcal{D}$ to optimize
the design of an experiment is particularly useful to guard against this
effect.
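Given a set of simulated $\ln B_{01}$ values (one per fiducial draw and data realization), the decisiveness is simply the fraction of outcomes beyond the 'strong' threshold. The sketch below assumes such samples are already available; the function names and the sample values are made up for illustration.

```python
def decisiveness_utility(ln_b01, threshold=5.0):
    """Step utility: 1 if the outcome is decisive for either model."""
    return 1.0 if abs(ln_b01) > threshold else 0.0

def decisiveness(ln_b01_samples, threshold=5.0):
    """Monte Carlo estimate of the expected utility with the step utility."""
    total = sum(decisiveness_utility(x, threshold) for x in ln_b01_samples)
    return total / len(ln_b01_samples)

# Hypothetical simulated outcomes of ln B_01 for one experiment.
samples = [6.2, 3.1, -20.0, 4.9, 5.5]
d = decisiveness(samples)  # three of the five outcomes exceed the threshold
```

Because the utility only counts outcomes past the threshold, an experiment whose outcomes scatter widely around $|\ln B_{01}| = 5$ is penalized relative to one that reliably lands beyond it.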



3.2 The 'expected strength of evidence' Figure of 
Merit 

Instead of the discrete utility function above, we can adopt one that is more
gradual in assessing the merit of the future probe. Such a utility function is

$$U(D_f,e,M_i) = (-1)^i \ln B_{01}, \qquad (9)$$



which describes the strength of the model selection result from the future
probe. By plugging this utility function into Eq. (5), we obtain a FoM that we
call the 'expected strength of evidence' and denote by $\mathcal{E}$. The
rationale is that for every given fiducial value of the parameters and for
every data realization, the best experiment is the one that maximizes the
support for the true model (i.e., the model from which the data actually
come), even though it might be that the experiment in question is not strong
enough to achieve decisiveness.

The factor $(-1)^i$ in Eq. (9) is to ensure that the utility only rewards
support for the correct model; e.g. under the more complex model ($M_1$), we
want to maximize $-\ln B_{01}$, the log-odds in favour of $M_1$. Bayes factors
can occasionally favour the wrong model, e.g. if the true model were a dark
energy model with $w = -0.999$, anything other than an extraordinarily precise
experiment is likely to favour the more predictive cosmological constant
model. Nevertheless, support for the wrong model will happen only in a small
parameter space region and will be overwhelmed when the average over the
current posterior is carried out, making the above nearly equivalent to the
simpler choice $U(D_f,e,M_i) = |\ln B_{01}|$. We have found in the dark energy
application presented below that for all future dark energy probes the
difference in the FoM between these two choices is less than about 5%, so in
practice almost negligible.
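The difference between the signed utility of Eq. (9) and the simpler absolute-value choice can be seen in a toy calculation. The draws below are invented pairs of ($\ln B_{01}$, index $i$ of the model that generated the data); the function names are hypothetical.

```python
def strength_utility(ln_b01, true_model_index):
    """Signed utility: rewards support for the model that generated
    the data (index 0 for M_0, index 1 for M_1)."""
    return (-1) ** true_model_index * ln_b01

def expected_strength(draws):
    """Average the signed utility over (ln_b01, true_model_index) pairs."""
    return sum(strength_utility(lnb, i) for lnb, i in draws) / len(draws)

# Hypothetical outcomes. The last one favours M_0 although M_1 is true,
# so it counts against the experiment under the signed utility.
draws = [(4.0, 0), (3.0, 0), (-12.0, 1), (1.5, 1)]
e_signed = expected_strength(draws)                      # (4 + 3 + 12 - 1.5) / 4
e_abs = sum(abs(lnb) for lnb, _ in draws) / len(draws)   # (4 + 3 + 12 + 1.5) / 4
```

The two estimates differ only through outcomes that support the wrong model; when such outcomes are rare, as the text argues, the two choices nearly coincide.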

It might seem at first glance that an experiment that maximizes the expected
strength of evidence is also one that minimizes the error ellipse in the
parameter space of interest. If this were true, then the ranking of probes
obtained with the expected strength of evidence would be the same as the one
from the DETF FoM. However, consider the SDDR expression for nested models
(Trotta 2007a):

$$B_{01} = \left. \frac{p(\phi|D_f,e,d,M_1)}{p(\phi|M_1)} \right|_{\phi=\phi_0}, \qquad (10)$$

where $\phi$ are the extra parameters of interest for the more complicated
model, which reduces to the simpler model for $\phi = \phi_0$. The odds
against $M_0$ are maximized when the marginal posterior on the extra
parameters is as small as possible at the location in parameter space
predicted by the simpler model. This means that maximizing $-\ln B_{01}$
requires minimizing the posterior error along the direction connecting the
fiducial value of $(w_0,w_a)$ to $(-1, 0)$ (if we restrict our consideration
to the dark energy example, where $\phi = (w_0,w_a)$). In other words, the
expected strength of evidence FoM favours experiments that deliver error
ellipses whose most tightly constrained principal direction points towards the
location of the simpler model in parameter space, hence minimizing model
confusion. If instead the data come from $M_0$, then the utility function
requires that the height of the posterior at the location of the true model be
as large as possible. Since the posterior is normalized, this requires the
posterior to be as tightly constrained around the true value as possible,
which is obviously desirable.

To summarize, the decisiveness FoM $0 \leq \mathcal{D} \leq 1$ can be
understood as an absolute scale measuring the model selection capabilities of
an experiment, with $\mathcal{D} = 1$ denoting the maximum possible
performance in terms of model comparison utility (i.e., an experiment that is
guaranteed to achieve a decisive model selection result). On the other hand,
many probes might still be interesting to build but may fall short of
achieving strong evidence anywhere in parameter space, hence such experiments
would all have $\mathcal{D} = 0$. Yet it is still a relevant question to try
and rank them according to their merits. This can be done by looking at the
expected strength of evidence, which always returns a non-zero value.
Therefore, the expected strength of evidence $\mathcal{E}$ can be regarded as
a relative scale of the capabilities of the probes.



4 APPLICATION TO FUTURE DARK 
ENERGY PROBES 

We now apply our newly defined model selection FoMs to a set of representative
proposals for future dark energy probes. We consider a ΛCDM model with dark
energy in the form of a cosmological constant versus an evolving dark energy
model where the equation of state is $w(z) = w_0 + w_a z/(1+z)$, described by
the two parameters $(w_0, w_a)$. This is a case of nested models, i.e., where
the simpler model (the cosmological constant) is obtained as a special case of
the evolving dark energy model by setting $w_0 = -1, w_a = 0$. The other
cosmological parameters (common to both models) are the baryonic density, the
dark matter density, the spatial curvature, the amplitude of scalar adiabatic
fluctuations and the spectral index of perturbations. We include curvature in
our analysis as this impacts strongly on the constraints on evolving dark
energy models (Wang & Mukherjee 2007; Clarkson, Cortes & Bassett 2007).

The current posterior is obtained using the following data sets: WMAP5
(Dunkley et al. 2009), Acbar07 (Kuo et al. 2007), CBI (Sievers et al. 2007),
BOOMERANG03 (Jones et al. 2006) for the CMB, SDSS LRG DR4 (Tegmark et al.
2006) for $P(k)$, the Hubble Key Project determination of $H_0$ (Freedman et
al. 2001), big bang nucleosynthesis limits on $\Omega_b h^2$ (Kirkman et al.
2003), and the Union supernova-Ia compilation (Kowalski et al. 2008). The
priors on the common parameters are irrelevant as they cancel from the Bayes
factor between the two models (as long as those priors are sufficiently wide
to include the maximum likelihood and uncorrelated with the dark energy
priors, see Trotta (2007a)), so the only important prior is the one on
$(w_0,w_a)$. We choose a Gaussian prior centered on $w_0 = -1, w_a = 0$ with
Fisher matrix $\Pi = \mathrm{diag}(1, 1/2)$. With this prior and the above
data sets, we obtain a Bayes factor $B_{01} = 13.7$ in favour of the ΛCDM
model (representing moderate evidence against an evolving dark energy). This
means that 93% of samples from the current posterior will be drawn from a ΛCDM
model, and 7% from a model with evolving dark energy.
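The 93% and 7% figures follow from converting the Bayes factor into posterior model probabilities. The short sketch below makes the arithmetic explicit, under the assumption (not stated in the passage, but required to reproduce the quoted numbers) that the two models have equal prior probabilities; the function name is hypothetical.

```python
def posterior_prob_m0(b01, prior_m0=0.5):
    """Posterior probability of M_0 from the Bayes factor B_01.
    Assumption: prior_m0 is the prior probability of M_0 (default 1/2)."""
    prior_odds = prior_m0 / (1.0 - prior_m0)
    posterior_odds = b01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

p_lcdm = posterior_prob_m0(13.7)   # 13.7 / 14.7
p_evolving = 1.0 - p_lcdm
```

With $B_{01} = 13.7$ this gives $13.7/14.7 \approx 0.93$ for the cosmological constant model and about $0.07$ for the evolving dark energy model.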



4.1 Future dark energy probes 

We use a selection of future missions based on the DETF classification
(Albrecht et al. 2006), using Fisher matrices provided by the DETFast package
(Dick & Knox 2006). This package provides only the Fisher matrices evaluated
at a fixed fiducial ΛCDM cosmology, so we have to assume that the Fisher
matrices do not vary significantly for different fiducial parameters drawn
from the current posterior. In other words, we take the Fisher matrix for the
future experiment at a fiducial ΛCDM point and translate it in parameter
space, without recomputing it for each new sample of $\theta_*$. This is
clearly an oversimplification, but since the dark energy parameters are the
most important ones for this application, and since 93% of points drawn from
the current posterior belong to the ΛCDM case, we expect that the results are
not too strongly biased. We intend to study the impact of this assumption and
to provide a more comprehensive study of the power of future dark energy
probes in future work, while using the simplified approach as an illustration
of our new FoMs here.

The Dark Energy Task Force has classified the dark energy probes in stages,
with stage II being those that are currently ongoing or completed, stage III
being medium-term projects and stage IV future large projects (optical large
survey telescopes, 'LST', space-based missions, 'S', and the Square Kilometer
Array, 'SKA'). The probes that we consider here include weak lensing (WL),
type-Ia supernovae (SN), Baryon Acoustic Oscillations (BAO), cluster counts
(CL) and combinations of several probes (ALL). The suffixes '-o' and '-p'
denote optimistic and pessimistic assumptions about systematic errors. The 'p'
in the names of the stage III experiments signals the use of photometric
redshifts while an 's' is used for spectroscopic surveys (which tend to cover
a much smaller area). For further, detailed information please consult the
DETF report.

The utility function computation proceeds as follows. In order to evaluate the
decisiveness, Eq. (8), and expected strength of evidence, Eq. (9), we need the
Bayes factor $\ln B_{01}$ for the future experiment. This is obtained
analytically via the SDDR formula, Eq. (10):

$$\ln B_{01} = \frac{1}{2} \ln \frac{|F_M|}{|\Pi|}
  - \frac{1}{2} (\bar\phi - \phi_0)^T F_M (\bar\phi - \phi_0), \qquad (11)$$

where $\phi = (w_0,w_a)$ are the dark energy parameters of interest, $\Pi$ is
their prior Fisher matrix and $F_M$ is the marginal posterior Fisher matrix
for $\phi$. We have defined $\phi_0 = (-1, 0)$ and $\bar\phi$ is the posterior
mean from both current and future data. This can be obtained as the
$\phi$-components of the posterior mean vector in the full parameter space,

$$\bar\theta = F^{-1} (L_f \theta_f + L \hat\theta + \Pi \theta_0). \qquad (12)$$

In the above, $L_f$ is the future probe likelihood Fisher matrix, $L$ is the
current constraints Fisher matrix, $\theta_0$ is the prior mean, $\theta_f$ is
the future maximum likelihood location while $\hat\theta$ is the present
constraints' maximum likelihood point. The Fisher matrix from the future and
present data, $F$, is given by

$$F = L_f + L + \Pi. \qquad (13)$$

The prior used in Eq. (11) is the same as the one adopted for the analysis of
the present-day data. This is because the prior in the context of Bayesian
model selection should be understood as representing the a priori plausible
parameter values under the model. Therefore, we do not update the prior to the
posterior from the present-day inference step when evaluating the future Bayes
factor. The likelihood is obtained from the Fisher matrix formalism, with the
above-mentioned additional assumption that the future likelihood Fisher matrix
is independent of the fiducial parameter value adopted.
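For a Gaussian prior centered on the simpler model and a Gaussian posterior, the Savage-Dickey ratio gives $\ln B_{01}$ in closed form: half the log ratio of the posterior to prior Fisher determinants, minus half the posterior chi-square distance of the posterior mean from the simpler model. The sketch below evaluates this together with the combination of Fisher matrices described above, restricted for simplicity to the two dark energy parameters only, so that no marginalization over other parameters is needed. This restriction, the helper names and all numerical values are assumptions made for illustration.

```python
import math

def mat_add(*mats):
    """Element-wise sum of 2x2 matrices."""
    return [[sum(m[r][c] for m in mats) for c in range(2)] for r in range(2)]

def mat_vec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def ln_b01_gaussian(L_f, L, Pi, phi_f, phi_hat, phi_0):
    """ln B_01 for nested Gaussian models in a two-parameter space."""
    F = mat_add(L_f, L, Pi)                      # total posterior Fisher matrix
    rhs = [a + b + c for a, b, c in zip(mat_vec(L_f, phi_f),
                                        mat_vec(L, phi_hat),
                                        mat_vec(Pi, phi_0))]
    phi_bar = mat_vec(inv2(F), rhs)              # posterior mean
    delta = [phi_bar[0] - phi_0[0], phi_bar[1] - phi_0[1]]
    f_delta = mat_vec(F, delta)
    chi2 = delta[0] * f_delta[0] + delta[1] * f_delta[1]
    return 0.5 * math.log(det2(F) / det2(Pi)) - 0.5 * chi2

# Hypothetical Fisher matrices; Pi matches the prior diag(1, 1/2).
Pi = [[1.0, 0.0], [0.0, 0.5]]
L = [[4.0, 0.0], [0.0, 1.0]]
L_f = [[95.0, 0.0], [0.0, 48.5]]
phi_0 = [-1.0, 0.0]
ln_b_at_lcdm = ln_b01_gaussian(L_f, L, Pi, phi_0, phi_0, phi_0)
ln_b_far = ln_b01_gaussian(L_f, L, Pi, [0.0, 0.0], phi_0, phi_0)
```

When the future estimate coincides with the simpler model, only the determinant term survives and the result is positive (the Occam's razor effect); when the future estimate lies far from it, the chi-square term dominates and $\ln B_{01}$ becomes strongly negative.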



© 0000 RAS, MNRAS 000, 000-000 






[Figure 1 panels. Left: decisiveness and expected strength of evidence plotted against the DETF FoM. Right: decisiveness ranking and expected strength of evidence ranking plotted against the DETF ranking, with each point labelled by probe name.]

Figure 1. Comparison of our model selection FoMs to the DETF FoM (left panels) and the ranking of dark energy probes derived from
them (right panels).



Some of the dark energy probes can achieve a very strong model selection in
favour of an evolving dark energy model in parts of the parameter space, often
obtaining $\ln B_{01} \lesssim -100$. This would correspond to a detection of
a non-constant equation of state at many sigma confidence level. However, we
do not expect our Gaussian approximation to the likelihood to hold true so far
into the tails of the distribution. Therefore, in order to be conservative, we
impose a floor at $\ln B_{01} = -20$ when computing the expected strength of
evidence from Eq. (9): any value of $\ln B_{01}$ below this floor is remapped
to the floor value.
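The effect of the floor on the expected strength of evidence can be illustrated with two invented outcomes, one of which lies far in the tail. The function name and the sample values are hypothetical; only the floor value of $-20$ is taken from the text.

```python
def floored_expected_strength(draws, floor=-20.0):
    """Average of (-1)^i * ln B_01 after remapping any ln B_01 below
    the floor to the floor value. draws holds (ln_b01, i) pairs, with
    i the index of the model that generated the data."""
    total = 0.0
    for ln_b01, i in draws:
        total += (-1) ** i * max(ln_b01, floor)
    return total / len(draws)

# One outcome under M_0 and one extreme outcome under M_1 (made-up values).
draws = [(6.0, 0), (-100.0, 1)]
with_floor = floored_expected_strength(draws)                     # (6 + 20) / 2
without_floor = floored_expected_strength(draws, floor=-1.0e9)    # (6 + 100) / 2
```

The floor limits how much a few extreme tail outcomes, where the Gaussian approximation is least trustworthy, can inflate the average.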

4.2 Results 

The results for the future probes are presented in Table 2
and plotted in Fig. 1, where we compare the DETF FoM
with our new model selection FoMs. We notice that the deci-
siveness FoM separates the sample into two distinct groups,
those with $\mathcal{D} < 0.1$ (single probes up to level III and several





Designing Decisive Detections 7 



Table 2. Results for FoMs of various dark energy probes. $\mathcal{D}$ is
the decisiveness given in Eq. (8) and $\mathcal{E}$ is the expected strength
of evidence, Eq. (9).
pessimistic single probes at level IV, together with
BAO-IVS-o) that are unlikely to provide a decisive answer to the
question whether dark energy is dynamical or not, and the
rest with $\mathcal{D} > 0.1$. This division is not apparent in
$\mathcal{E}$ and the DETF FoM, and it leads to critical values of
$\mathcal{E} \approx 4$ and DETF FoM $\approx 70$ below which an
experiment is unlikely to obtain a strong model selection result.

The ranking of the experiments by $\mathcal{D}$ and by
$\mathcal{E}$ is almost the same, while the DETF FoM gives a
similar but not always identical ranking. Looking at the
right-hand panels in Fig. 1, we notice that the WL and SN probes
tend to lie above the trend line (they are more likely to
provide a decisive model selection result than would be expected
from the DETF FoM), while spectroscopic BAO probes lie below.
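
A rank comparison of this kind can be sketched as follows. This
is an illustration under stated assumptions only: the probe
names and FoM values are hypothetical placeholders, not entries
of Table 2, and the helper `rank_descending` is introduced here
for the example.

```python
def rank_descending(scores):
    """Map each name to its rank (1 = largest score); ties are broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical FoM values, NOT taken from Table 2.
detf_fom = {"probe_A": 120.0, "probe_B": 75.0, "probe_C": 40.0, "probe_D": 15.0}
decisiveness = {"probe_A": 0.30, "probe_B": 0.12, "probe_C": 0.18, "probe_D": 0.02}

detf_rank = rank_descending(detf_fom)
model_rank = rank_descending(decisiveness)

# Positive shift: the probe ranks better on model selection than on the DETF FoM.
rank_shift = {name: detf_rank[name] - model_rank[name] for name in detf_fom}
```

In this toy input, `probe_C` moves up one place and `probe_B`
moves down one place when ranked by decisiveness rather than by
the DETF FoM; probes with a positive shift are the analogue of
those lying above the trend line.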

In Figure 2 we show the distribution of $\ln B_{01}$ for
10^ outcomes for the ALL-SKA-o probe (the most power- 
ful probe considered here). The red bars on the right hand 
side are for data drawn from a $\Lambda$CDM model, for which this
probe often but not always achieves a decisive outcome. The 




Figure 2. Histogram of $\ln B_{01}$ values for the ALL-SKA-o DETF
case. Red bars (those on the right) show cases drawn from a
$\Lambda$CDM model (93% of cases according to the current posterior)
and the blue bar those with an evolving dark energy fiducial model
(7% of cases, capped at $\ln B_{01} = -20$ as described in the text).



blue bar on the left shows that the probe will deliver very
powerful results if the dark energy is actually evolving (given
the priors adopted, and current knowledge of dark energy
parameters). It is not surprising that the model selection
outcomes against $\Lambda$CDM tend to be stronger than those that
support it: it is always more difficult to strongly support a
nested model, as the simpler model only "profits" from its
predictiveness (thanks to the Occam's razor effect), but can
never provide a better fit.
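
This asymmetry can be made explicit in a one-parameter Gaussian
toy case. The following is a sketch under stated assumptions,
not the calculation used for the results above: the extended
model $M_1$ has a single extra parameter $\theta$ with a
Gaussian prior of width $\Sigma$ centred on the nested value
$\theta = 0$, the likelihood is Gaussian in $\theta$ with mean
$\mu$ and width $\sigma$, and the Bayes factor is evaluated with
the Savage-Dickey density ratio. The symbols $\theta$, $\mu$,
$\sigma$ and $\Sigma$ are introduced here for illustration only.

```latex
\begin{align}
B_{01} &= \frac{p(\theta = 0 \mid d, M_1)}{p(\theta = 0 \mid M_1)}, \\
\ln B_{01} &= \frac{1}{2}\ln\!\left(1 + \frac{\Sigma^2}{\sigma^2}\right)
  - \frac{\mu^2\,\Sigma^2}{2\,\sigma^2\left(\sigma^2 + \Sigma^2\right)} .
\end{align}
```

The first term is the Occam factor. It is the largest value that
$\ln B_{01}$ can reach (attained at $\mu = 0$) and it grows only
logarithmically with $\Sigma/\sigma$. The second term is the
misfit penalty, which decreases without bound as $|\mu|/\sigma$
grows, so evidence against the nested model can become
arbitrarily strong while evidence for it cannot.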



5 CONCLUSIONS 

We have presented a general Bayesian formalism for the
definition of FoMs encapsulating the expected scientific return
of a future experiment. Our method fully accounts for all
sources of uncertainty involved in the prediction, including
present-day model and parameter uncertainties, and realization
noise. It thus improves on the usual Fisher matrix methods by
producing more realistic forecasts for the possible
distribution of future experimental outcomes.
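
The way these three sources of uncertainty enter a forecast can
be sketched with a toy Monte Carlo. This is a minimal
illustration under loud assumptions, not our actual pipeline:
the nested model fixes one parameter at zero, the extended model
gives it a Gaussian prior, the true parameter under the extended
model is drawn from that prior for simplicity, and the Bayes
factor uses a Gaussian Savage-Dickey expression. The function
names, the widths `sigma` and `prior_width`, and the threshold
$|\ln B_{01}| \geq 5$ used to call an outcome decisive are all
assumptions of the example; only the 93% model probability and
the floor at $-20$ are taken from the text.

```python
import numpy as np


def ln_b01_gaussian(mu_hat, sigma, prior_width):
    """ln B01 for a nested model (parameter fixed at 0) versus an extended model
    with prior N(0, prior_width^2), for a Gaussian likelihood N(mu_hat, sigma^2)."""
    total_var = sigma**2 + prior_width**2
    occam = 0.5 * np.log(total_var / sigma**2)
    misfit = 0.5 * mu_hat**2 * prior_width**2 / (sigma**2 * total_var)
    return occam - misfit


def simulate_ln_b01(n_draws, sigma, prior_width, p_simple=0.93, floor=-20.0, seed=0):
    """Draw future outcomes of ln B01, floored at `floor` (toy model, see lead-in)."""
    rng = np.random.default_rng(seed)
    # 1. Model uncertainty: pick the true model with its present-day probability.
    simple_is_true = rng.random(n_draws) < p_simple
    # 2. Parameter uncertainty: true value is 0 if nested, else drawn (assumption: from the prior).
    true_value = np.where(simple_is_true, 0.0, rng.normal(0.0, prior_width, n_draws))
    # 3. Realization noise: the future measurement scatters around the true value.
    mu_hat = true_value + rng.normal(0.0, sigma, n_draws)
    return np.maximum(ln_b01_gaussian(mu_hat, sigma, prior_width), floor)


ln_b = simulate_ln_b01(n_draws=100_000, sigma=0.05, prior_width=1.0)
threshold = 5.0  # assumed threshold on |ln B01| for a decisive outcome
frac_decisive = np.mean(np.abs(ln_b) >= threshold)
```

With these hypothetical widths the Occam term stays below the
assumed threshold, so every decisive outcome in the toy run is
one that disfavours the nested model. A Fisher matrix forecast
at a single fiducial point would correspond to skipping steps 1
and 2.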

We used this framework to define two Figures of Merit 
for probes that measure the dark energy equation of state in 
order to test the $\Lambda$CDM paradigm: the decisiveness
$\mathcal{D}$, which quantifies the probability that a probe will
deliver a decisive result in favour of or against the
cosmological constant, and the expected strength of evidence
$\mathcal{E}$, which returns a measure of the expected power of a
probe for model selection. We
compared these quantities to the widely-used DETF FoM 
for a range of probes, and found that the rankings agree 
reasonably well, but that weak lensing and supernova probes 
have a higher than expected model selection power relative 
to their DETF FoM ranking. We also found, for our choice of 
prior, that there is a critical DETF FoM of around 70 below 
which probes are very unlikely to obtain a strong model 
selection result. 

An additional advantage of the formalism presented in
this paper, and of any Figures of Merit that use it, is the
possibility of including further observations, for example those
that constrain the growth history or the presence of effective
anisotropic stresses. One simply extends the likelihood based
on the predictions of the underlying models; the procedure and
the interpretation of the results are unchanged. There is
therefore no need to define new FoMs as data analysis goals for
future probes evolve.

The methodology presented here is widely applicable 
to a variety of forecasting and optimization problems. Our 
application to the model selection capabilities of future dark 
energy missions is but a first step towards a fully Bayesian 
approach to performance forecasting.